Global Analytics in the Face of Bandwidth and Regulatory Constraints

نویسندگان

  • Ashish Vulimiri
  • Carlo Curino
  • Brighten Godfrey
  • Thomas Jungblut
  • Jitendra Padhye
  • George Varghese
چکیده

Global-scale organizations produce large volumes of data across geographically distributed data centers. Querying and analyzing such data as a whole introduces new research issues at the intersection of networks and databases. Today systems that compute SQL analytics over geographically distributed data operate by pulling all data to a central location. This is problematic at large data scales due to expensive transoceanic links, and may be rendered impossible by emerging regulatory constraints. The new problem of Wide-Area Big Data (WABD) consists in orchestrating query execution across data centers to minimize bandwidth while respecting regulatory constaints. WABD combines classical query planning with novel network-centric mechanisms designed for a wide-area setting such as pseudo-distributed execution, joint query optimization, and deltas on cached subquery results. Our prototype, Geode, builds upon Hive and uses 250× less bandwidth than centralized analytics in a Microsoft production workload and up to 360× less on popular analytics benchmarks including TPC-CH and Berkeley Big Data. Geode supports all SQL operators, including Joins, across global data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and Evaluation of a Method for Partitioning and Offloading Web-based Applications in Mobile Systems with Bandwidth Constraints

Computation offloading is known to be among the effective solutions of running heavy applications on smart mobile devices. However, irregular changes of a mobile data rate have direct impacts on code partitioning when offloading is in progress. It is believed that once a rate-adaptive partitioning performed, the replication of such substantial processes due to bandwidth fluctuation can be avoid...

متن کامل

Application of Big Data Analytics in Power Distribution Network

Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...

متن کامل

Hybridization of Facial Features and Use of Multi Modal Information for 3D Face Recognition

Despite of achieving good performance in controlled environment, the conventional 3D face recognition systems still encounter problems in handling the large variations in lighting conditions, facial expression and head pose The humans use the hybrid approach to recognize faces and therefore in this proposed method the human face recognition ability is incorporated by combining global and local ...

متن کامل

Regulatory and Biosafety Challenges for Vaccines

The global regulatory plan for vaccines provides a unique opportunity to develop safe and effective ones with assured quality. Methods used by regulators address challenges of new products and technologies and also increase understanding of benefits and risks of existing products. First, the laboratory-based regulatory sciences evolve correlates of immunity and safety; or improve the product ch...

متن کامل

Jointly power and bandwidth allocation for a heterogeneous satellite network

Due to lack of resources such as transmission power and bandwidth in satellite systems, resource allocation problem is a very important challenge. Nowadays, new heterogeneous network includes one or more satellites besides terrestrial infrastructure, so that it is considered that each satellite has multi-beam to increase capacity. This type of structure is suitable for a new generation of commu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015